Storage system performing data deduplication, method of operating storage system, and method of operating data processing system

ABSTRACT

A storage system performing data deduplication includes a storage device configured to store data received from a host, and a controller configured to receive the data and an index associated with the data received from the host. The controller includes a memory configured to store mapping information and a reference count, the mapping information associating the index received from the host with a physical address of the storage system, the reference count associated with the index received from the host. The controller determines whether the data received from the host corresponds to a duplicate of data previously stored in the storage device by reading, from the memory, the mapping information and the reference count, the reading based on the index received from the host. The controller performs a deduplication process by updating the reference count if the data received from the host corresponds to the duplicate of data previously stored.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefits of U.S. Provisional Application No. 62/425,686, filed on Nov. 23, 2016, in the USPTO, and Korean Patent Application No. 10-2017-0031808, filed on Mar. 14, 2017, in the Korean Intellectual Property Office, the disclosures of each of which are incorporated herein in their entireties by reference.

BACKGROUND

Inventive concepts relate to a storage system, and more particularly, to a storage system performing data deduplication, a method of operating a storage system, and a method of operating a data processing system.

Data deduplication techniques determine whether data to be stored in a storage system is already stored in the storage system. When data deduplication techniques determine the data is already stored, the data is not stored in the storage system in duplicate and only a link to already stored data is managed, and thus, storage space may be efficiently used. Since the deduplication techniques may improve the efficiency of use of storage systems, the deduplication techniques are desired (for example, needed) for storage systems for a large amount of data.

However, in order to use the deduplication techniques, various information such as data (or a hash index) and a data storage location (e.g., a logical/physical address) corresponding thereto is desired to be managed, and thus, the resources for managing information for deduplication may increase.

SUMMARY

Inventive concepts provide a storage system for reducing a burden of managing information related to deduplication.

Inventive concepts provide a method of operating a storage system.

Inventive concepts also provide a method of operating a data processing system.

According to an example embodiment of inventive concepts, there is provided a storage system including a storage device configured to store data received from a host, and a controller configured to receive, from the host, the data and an index, the index associated with the data received from the host. The controller includes a memory configured to store mapping information and a reference count, the mapping information associating the index received from the host with a physical address of the storage system, the reference count associated with the index received from the host. The controller is configured to determine whether the data received from the host corresponds to a duplicate of data previously stored in the storage device by reading, from the memory, the mapping information and the reference count, the reading based on the index received from the host. The controller is configured to perform a deduplication process by updating the reference count if the data received from the host corresponds to the duplicate of data previously stored in the storage device.

According to an example embodiment of inventive concepts, there is provided a method of operating a storage system, the method including receiving, from a host, first data and a first index, the first index being associated with the first data; determining whether the first index is the same as an index corresponding to data previously stored in the storage system; performing a data deduplication by updating a reference count without writing the first data in response to determining that the first index is the same as the index corresponding to data previously stored in the storage system, the reference count being previously stored in the storage system; and providing the updated reference count to the host.

According to an example embodiment of inventive concepts, there is provided a method of operating a data processing system including a storage system, the method including storing mapping information in the storage system, the mapping information including a mapping between an index generated using data from an external system and a physical address indicating a storage location of the data; receiving in the storage system a write request including additional data and an index corresponding to the additional data; determining, in the storage system, whether the additional data corresponds to a duplicate of data already stored in the storage system; and performing a deduplication process by updating a reference count stored in the storage system if the additional data corresponds to the duplicate of data already stored in the storage system.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of inventive concepts will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a block diagram of a data processing system according to an embodiment of inventive concepts;

FIGS. 2 and 3 are block diagrams showing specific implementations of a data processing system;

FIG. 4 is a block diagram showing functions that are performed by a host of a data processing system according to an embodiment of inventive concepts;

FIG. 5 is a block diagram of a storage system according to an embodiment of inventive concepts;

FIG. 6 is a block diagram showing an example of various modules stored in a working memory of FIG. 5;

FIGS. 7A and 7B are diagrams showing examples of information managed in a host and a storage system, according to embodiments of inventive concepts;

FIG. 8 is a block diagram showing an example of data writing and reading operations of a data processing system according to an embodiment of inventive concepts;

FIG. 9 is a flowchart showing a method of operating a host, according to an embodiment of an inventive concept;

FIG. 10 is a flowchart showing a method of operating a storage system, according to an embodiment of an inventive concept;

FIGS. 11 to 18 are diagrams showing examples of communication between a host and a storage system in a data processing system according to an embodiment of inventive concepts; and

FIG. 19 is a block diagram of a network system including a server system according to an embodiment of an inventive concept.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Hereinafter, embodiments of inventive concepts will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a data processing system 10 according to an embodiment of inventive concepts.

Referring to FIG. 1, the data processing system 10 may include a host 100 and a storage system 200. The storage system 200 may include a controller 210 and a storage device 220. According to an embodiment of inventive concepts, the host 100 may include an index generator 110 and the controller 210 of the storage system 200 may include an index table 211. In the example of FIG. 1, the index table 211 is shown as being provided in the controller 210. However, embodiments of inventive concepts are not limited thereto. For example, the index table 211 may be stored in a memory outside the controller 210 in the storage system 200.

The data processing system 10 may include storage media for storing data upon request from an external system (e.g., a computing node). As an example, the storage system 200 may include one or more solid state drives (SSDs). When the storage system 200 includes an SSD, the storage system 200 may include a plurality of flash memory chips (e.g., NAND memory chips) that store data based on a non-volatile scheme. The storage system 200 may include one flash memory device. The storage system 200 may include a memory card including one or more flash memory chips.

When the storage system 200 includes a flash memory, the flash memory may include a two-dimensional (2D) NAND memory array or a three-dimensional (3D) or vertical NAND memory array. The 3D memory array is monolithically formed in at least one physical level of a circuit formed on or in a silicon substrate as a circuit related to the operation of arrays, which includes memory cells having an active region disposed on the silicon substrate, or to the operation of the memory cells. The term “monolithically” means that layers of each level of the array are stacked directly on layers of each lower level of the array.

In an example embodiment according to inventive concepts, the 3D memory array includes vertical NAND strings arranged in a vertical direction so that at least one memory cell is located above another memory cell. The at least one memory cell may include a charge trap layer.

U.S. Pat. Nos. 7,679,133, 8,553,466, 8,654,587, and 8,559,235, and U.S. Patent Application Publication No. 2011/0233648 disclose that a 3D memory array includes a plurality of levels and word lines and/or bit lines are shared between the levels, and are incorporated herein by reference in the present specification in their entirety.

As another example, the storage system 200 may include other various types of memories. For example, the storage system 200 may include a non-volatile memory, such as magnetic RAM (MRAM), spin-transfer torque MRAM, conductive bridging RAM (CBRAM), ferroelectric RAM (FeRAM), phase RAM (PRAM), resistive RAM, nanotube RAM, polymer RAM (PoRAM), a nano floating gate memory (NFGM), a holographic memory, a molecular electronic memory, and/or an insulator resistance change memory.

The host 100 may perform management operations of data in the data processing system 10. As an example, the host 100 may provide a data write or read request to the storage system 200. In addition, in response to a data erase request from the host 100, the storage system 200 may perform an erase operation on data of an area indicated by the host 100.

The host 100 may communicate with the storage system 200 via various interfaces. The host 100 may include various types of devices capable of performing data access to the storage system 200. For example, the host 100 may be or may include an application processor (AP) that communicates with the flash memory-based storage system 200. According to an example embodiment, the host 100 may communicate with the storage system 200 via various interfaces, such as Universal Serial Bus (USB), MultiMediaCard (MMC), PCI-Express (PCI-E), AT Attachment (ATA), Serial AT Attachment (SATA), Parallel AT Attachment (PATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), Enhanced Small Disk Interface (ESDI), and Integrated Drive Electronics (IDE).

According to an example embodiment, the data processing system 10 may use data deduplication techniques. In the case of the use of the deduplication techniques, when or if data requested to be written is a duplicate of (or is the same as) data already stored in the storage system 200, the processing for the write request may be completed by managing only a link to already stored data instead of storing the data, requested to be written, in duplicate. Accordingly, a storage space of the storage system 200 may be more efficiently used.

The index table 211 in the storage system 200 may be managed in order to determine whether data requested to be written is duplicate data. According to an embodiment, the index generator 110 of the host 100 may generate an index Index corresponding to data requested to be written and provide the index Index to the storage system 200. The index Index may have information for identifying data Data, and as an example, an index having a unique value for each data may be generated and provided to the storage system 200.

The index generator 110 may be implemented in various ways. For example, the index generator 110 may include an arithmetic circuit implemented by hardware. Alternatively or additionally, the index generator 110 may be implemented by software that performs arithmetic functions. According to an embodiment, the index generator 110 may correspond to a hash engine that calculates a hash value as an index through an operation using a hash function for the data Data. When the index generator 110 corresponds to a hash engine, the index generator 110 may calculate a hash value by using various hash algorithms, such as GOST, HAVAL, MD2, MD4, MD5, PANAMA, RadioGatun, RIPEMD, RIPEMD-128/256, RIPEMD-160, RIPEMD-320, SHA-0, SHA-1, SHA-256/224, SHA-512/384, SHA-3, and/or WHIRLPOOL.

Upon a data write request, the storage system 200 may receive the data Data and receive the index Index generated from the data Data as information for identifying the data Data. During a data write operation, the controller 210 of the storage system 200 may map the index Index to a physical address (e.g., a physical block address PBA) and store the data Data in a location corresponding to the mapped physical address PBA. Mapping information between the index Index and the physical address PBA may be stored and managed in the index table 211.

According to an example embodiment, the host 100 may provide data Data and an index Index corresponding thereto to the storage system 200 as information for data writing without executing or connecting with a file system for managing data on a file-by-file basis or for generating a logical address for the storage system 200. In addition, when the storage system 200 includes a flash memory, the controller 210 may include a Flash Translation Layer (FTL) to provide an interface between the host 100 and the storage device 220, and a mapping between the index Index and the physical address PBA may be performed by an address mapping operation through the FTL.

An operation example related to data deduplication according to an embodiment of inventive concepts will be described as follows.

As data writing is requested from an external system, the host 100 may receive data Data and a logical address (not shown) corresponding thereto from the external system. In addition, an index Index generated by the index generator 110 may be stored in the host 100. As an example, the host 100 may include a memory (not shown), and the index Index may be aligned with the received logical address and stored in the memory as a tree structure.

The storage system 200 may receive data Data and an index Index from the host 100, may determine whether the data Data is a duplicate by using the index Index, and perform a deduplication process according to the determination result. As an example, the storage system 200 may compare the index Index received from the host 100 with an index corresponding to data previously stored in the storage system 200, and may determine whether the data Data is a duplicate, according to a comparison result.

When previously received indexes are stored in the index table 211, whether data is a duplicate may be determined through a comparison operation between a received index and the indexes stored in the index table 211. When an index that is the same as a received index is not in the index table 211, the storage system 200 may store the data Data in a location indicated by a physical address PBA newly mapped to the received index Index. However, when an index that is the same as a received index is in the index table 211, the storage system 200 may determine that the received data Data is duplicate data, and may perform a deduplication process by not storing the received data Data in the storage device 220.

As data writing based on deduplication is performed, data previously stored in the storage system 200 may be referred to (or accessed) by a plurality of logical addresses from an external system. According to an example embodiment, the storage system 200 may further store count information (e.g., a reference count) to manage the number of times previously stored data is referenced. For example, the reference count may be stored and managed in the index table 211. According to an embodiment, when data Data requested to be written from the host 100 corresponds to duplicate data, the storage system 200 may perform an update operation on the reference count, and provide the host 100 with an updated reference count along with information indicating that deduplication processing has been performed.
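The write handling described above can be sketched in software as follows. This is a minimal illustration only, not the controller implementation: the names `StorageSystem`, `Entry`, and `write`, the use of a Python dictionary as the index table, and the choice to start a new entry at a reference count of 1 are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pba: int        # physical block address mapped to the index
    ref_count: int  # number of references to the stored data


class StorageSystem:
    """Hypothetical model of the controller's index table and write path."""

    def __init__(self):
        self.index_table = {}  # index -> Entry (stand-in for index table 211)
        self.blocks = {}       # pba -> data (stand-in for storage device 220)
        self.next_pba = 0

    def write(self, index: bytes, data: bytes):
        """Return (deduplicated, reference_count) for one write request."""
        entry = self.index_table.get(index)
        if entry is not None:
            # Same index already present: skip the data write and
            # only update the reference count.
            entry.ref_count += 1
            return True, entry.ref_count
        # New index: map it to a fresh physical address and store the data.
        pba = self.next_pba
        self.next_pba += 1
        self.blocks[pba] = data
        self.index_table[index] = Entry(pba=pba, ref_count=1)  # assumed initial count
        return False, 1


storage = StorageSystem()
first = storage.write(b"idx-1", b"payload")
second = storage.write(b"idx-1", b"payload")
```

In this sketch the second request returns the deduplication flag together with the updated reference count, which corresponds to the information provided back to the host 100.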

In an example embodiment, the physical address PBA and the reference count may be aligned with the index Index in the index table 211 and stored in the index table 211. According to an embodiment, actual information of the index Index may not be stored in the index table 211. The determination of duplicate data described above may be performed in various manners. For example, whether there is duplicate data may be determined by checking stored information (e.g., a physical address and/or a reference count) aligned with the received index Index.
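One way to picture a table in which the physical address and reference count are aligned with the index, without the index itself being stored, is a direct-mapped array whose slot position is derived from the index. The sketch below is an assumption-laden illustration: the class `AlignedIndexTable`, the modulo slot calculation, and the table size are hypothetical, and handling of two different indexes that land in the same slot is deliberately omitted.

```python
NUM_SLOTS = 1024  # assumed table size, for illustration only


class AlignedIndexTable:
    """Hypothetical table: the slot position encodes the index, so only the
    physical address and reference count are stored."""

    def __init__(self, num_slots: int = NUM_SLOTS):
        self.num_slots = num_slots
        self.pba = [None] * num_slots
        self.ref_count = [0] * num_slots

    def slot_of(self, index: bytes) -> int:
        # Assumed alignment rule; slot collisions are not handled here.
        return int.from_bytes(index, "big") % self.num_slots

    def is_duplicate(self, index: bytes) -> bool:
        # Duplicate check by inspecting the stored information aligned
        # with the received index.
        return self.ref_count[self.slot_of(index)] > 0

    def record_write(self, index: bytes, new_pba: int):
        slot = self.slot_of(index)
        if self.ref_count[slot] > 0:
            self.ref_count[slot] += 1
        else:
            self.pba[slot] = new_pba
            self.ref_count[slot] = 1
        return self.pba[slot], self.ref_count[slot]


table = AlignedIndexTable()
idx = (5).to_bytes(16, "big")
before = table.is_duplicate(idx)
first = table.record_write(idx, 100)
second = table.record_write(idx, 101)
```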

According to the embodiment described above, the host 100 may directly provide the index Index generated by processing data Data to the storage system 200 as information for writing/reading the data Data, and thus, the amount of information that is managed for data deduplication may be reduced, and a memory space for storing information may be reduced in the host 100. For example, in a conventional case, the host 100 separately manages mapping information between an index and a file ID (or a logical block address for the storage system 200) through the operation of a file system or the like. However, according to an embodiment of inventive concepts, the host 100 may process data deduplication without executing a file system or storing and managing information according to a result of the executing.

FIGS. 2 and 3 are block diagrams showing specific implementations of data processing systems 300A and 300B. In the following embodiments, it is assumed that the above-described index generator corresponds to a hash engine for generating a hash value and the index is a hash index. In addition, the data processing system 300A may compress data from the outside and may store compressed data. However, embodiments of inventive concepts are not limited thereto.

Referring to FIG. 2, the data processing system 300A may include a host 310A and a storage system 320A and may receive data Data and corresponding logical addresses LBA from an external system (e.g., an external computing node). Assuming that first to third logical addresses LBA 1 to LBA 3 and data Data corresponding thereto have been previously provided from an external system, mapping information between the first to third logical addresses LBA 1 to LBA 3 and first to third hash indexes Hash Index 1 to Hash Index 3 may be stored in the host 310A, and mapping information between the first to third hash indexes Hash Index 1 to Hash Index 3 and first to third physical addresses PBA 1 to PBA 3 may be stored in the storage system 320A. It is also assumed that a fourth logical address LBA 4 and data Data corresponding thereto are provided from an external system and a fourth hash index Hash Index 4 is generated from the data Data.

The host 310A may include a hash engine 311A as an index generator, and may also include a memory 312A that stores generated hash indexes. The hash engine 311A may generate a hash index having a plurality of bits in response to the data Data having a specific (or, alternatively, predetermined) size. The size of the data Data and the size of the hash index may be variously defined. For example, a hash index of 128 bits may be generated from 4 kB of data Data.
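As a concrete illustration of the sizes mentioned above, the following sketch derives a 128-bit hash index from a 4 kB block. MD5 is used only because it is one of the listed algorithms that produces a 128-bit digest; the function name `hash_index` and the fixed block-size check are assumptions of this sketch, not features of the hash engine 311A.

```python
import hashlib

BLOCK_SIZE = 4096  # 4 kB of data per hash index, as in the example above


def hash_index(block: bytes) -> bytes:
    """Return a 128-bit hash index for one fixed-size block (MD5 assumed)."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must be exactly BLOCK_SIZE bytes")
    return hashlib.md5(block).digest()


index_a = hash_index(b"A" * BLOCK_SIZE)
index_b = hash_index(b"A" * BLOCK_SIZE)  # same content as index_a's block
index_c = hash_index(b"B" * BLOCK_SIZE)  # different content
```

Identical blocks yield identical indexes, which is the property the storage system relies on when it compares a received hash index with previously stored ones.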

According to an embodiment, the first to third hash indexes Hash Index 1 to Hash Index 3 may be aligned with the first to third logical addresses LBA 1 to LBA 3 and stored in the memory 312A. When a logical address LBA for writing or reading data is received from the external system, a previously stored hash index corresponding to the logical address LBA may be read from the memory 312A.

In addition, the storage system 320A may include a memory 321A and a compressor 322A that compresses the data Data. As an example, the storage system 320A may generate a physical address PBA corresponding to a received hash index, and the memory 321A may store mapping information between the hash index and the physical address PBA as a table. As an example, the first to third hash indexes Hash Index 1 to Hash Index 3 and the first to third physical addresses PBA 1 to PBA 3 may be stored together in the memory 321A. Additionally or alternatively, the first to third physical addresses PBA 1 to PBA 3 may be aligned with the first to third hash indexes Hash Index 1 to Hash Index 3 and stored in the memory 321A. In this case, the first to third hash indexes Hash Index 1 to Hash Index 3 may not actually be stored in the memory 321A.

According to an example embodiment, the memory 321A may further store a reference count corresponding to a hash index. For example, the memory 321A may further store reference counts Ref CNT 1 to Ref CNT 3 corresponding to the first to third hash indexes Hash Index 1 to Hash Index 3. For example, in the embodiment of inventive concepts, the reference counts Ref CNT 1 to Ref CNT 3 may be managed in the storage system 320A.

An example of a data deduplication operation will be described below.

The host 310A may receive the fourth logical address LBA 4 and the data Data corresponding thereto, and the hash engine 311A may generate the fourth hash index Hash Index 4 from the data Data. Mapping information between the generated fourth hash index Hash Index 4 and the fourth logical address LBA 4 may be stored in the memory 312A. Also, the host 310A may provide the fourth hash index Hash Index 4 and the corresponding data Data to the storage system 320A in providing a data write request to the storage system 320A.

The storage system 320A may determine whether there is duplicate data, by using the fourth hash index Hash Index 4. For example, the storage system 320A may determine whether any one of the first to third hash indexes Hash Index 1 to Hash Index 3 corresponding to previously stored data is the same as the fourth hash index Hash Index 4. If the storage system 320A determines that a hash index that is the same as the fourth hash index Hash Index 4 is not present in the memory 321A, the storage system 320A may map the fourth hash index Hash Index 4 to the fourth physical address PBA 4 and store data Data in a location indicated by the fourth physical address PBA 4. In addition, mapping information between the fourth hash index Hash Index 4 and the fourth physical address PBA 4 may be stored in the memory 321A.

However, if the fourth hash index Hash Index 4 is the same as any one hash index (e.g., the first hash index Hash Index 1), a duplicate storage of the data Data may be skipped through deduplication processing and data corresponding to the first hash index Hash Index 1 may be referred to by the fourth logical address LBA 4. According to an embodiment, when deduplication processing is performed, the value of the first reference count Ref CNT 1 corresponding to the first hash index Hash Index 1 may be updated, and as an example, the value of the first reference count Ref CNT 1 may increase by one. Also, according to an embodiment, an updated first reference count Ref CNT 1 may be provided from the storage system 320A to the host 310A, along with information indicating that deduplication processing has been performed.

FIG. 3 illustrates a case where a compressor for compressing data Data is provided in a host. As shown in FIG. 3, the data processing system 300B may include a host 310B and a storage system 320B. The host 310B may include a hash engine 311B, a memory 312B, and a compressor 313B. In addition, the storage system 320B may include a memory 321B that stores mapping information between a hash index and a physical address. The data processing system 300B shown in FIG. 3 may also perform data deduplication in the same manner as in the above-described embodiments. For example, the storage system 320B may use a hash index provided from the host 310B to determine whether there is duplicate data, and may perform data deduplication according to a determination result.

A memory for storing various mapping information according to the above-described embodiments may include or be implemented by various types of memories, for example, dynamic random access memory (DRAM), static random access memory (SRAM), thyristor RAM (T-RAM), zero capacitor RAM (Z-RAM), and/or twin transistor RAM (TTRAM).

FIG. 4 is a block diagram showing functions that are performed by a host of a data processing system 400 according to an example embodiment of inventive concepts.

Referring to FIG. 4, the data processing system 400 may include a host 410 and a storage system 420, and the host 410 may include various modules implemented by hardware and/or software. For example, the host 410 may include a remote procedure call (RPC) module 411, a block service module 412, a deduplication management module 413, and/or a compression module 414.

The RPC module 411 may perform communication with another system or another server. For example, the RPC module 411 may perform a function of calling another server for data transmission/reception. The block service module 412 may perform a function for processing data management on a block basis. The deduplication management module 413 may be provided to perform, in the host 410, a part of a function for data deduplication. For example, the deduplication management module 413 may include a hash engine that generates a hash index by using data. The deduplication management module 413 may also include a memory for storing mapping information between a logical address from an external system and the hash index. The compression module 414 may compress data and provide compressed data to the storage system 420.

According to example embodiments of inventive concepts, a hash index may be provided directly to the storage system 420 as information related to the storage and reading of data. Accordingly, the host 410 does not need to execute a file system for managing data on a file basis or a block layer for managing data, of which the size varies through compression, in accordance with a size corresponding to a logical block. For example, memory resources of the host 410, which are desired in connection with data deduplication, may be reduced, and system performance may be improved since at least some of the functions of the file system and/or block layer need not be executed.

FIG. 5 is a block diagram of a storage system 500 according to an embodiment of inventive concepts. The storage system 500 may include a controller and a storage device, and the configuration shown in FIG. 5 corresponds to an embodiment of the controller.

Referring to FIG. 5, the storage system 500 may include a central processing unit 510, which is a processor, a host interface 520, a memory interface 530, and a working memory 540. According to an example embodiment, the storage system 500 may further include a hash engine 550. The working memory 540 may store an index table 541, and in a modified embodiment, another memory in the storage system 500 may store the index table 541.

The central processing unit 510 may control all operations of the storage system 500 by executing machine-readable instructions including various programs stored in the working memory 540. Software including various programs related to data deduplication in addition to the function of the storage system 500 may be loaded in the working memory 540. The working memory 540 may be implemented with or may include random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), a flash memory or another memory.

According to the embodiment described above, a host may generate a hash index for data from the external system and provide the hash index to the storage system 500. The storage system 500 may further include a hash engine 550. According to an embodiment, the hash engine 550 may further perform a hash operation on the hash index and/or data provided from the host. For example, as the storage system 500 is further provided with the hash engine 550, the amount of information that is stored in the storage system 500 may be reduced.

More specific configurations and operations of inventive concepts will be described with reference to FIGS. 5 and 6. FIG. 6 is a block diagram showing an example of various modules. The modules may include hardware or software stored in the working memory 540 of FIG. 5. When the storage system 500 includes a flash memory, the various modules shown in FIG. 6 may be defined as those included in an FTL.

The index table 541 described above may be stored in the working memory 540, and modules that perform various functions by the operation of the central processing unit 510 may also be stored in the working memory 540. As an example, an address conversion module 542, a deduplication control module 543, and a data management module 544 may be further stored in the working memory 540.

The address conversion module 542 converts a hash index provided from the host into a physical address. As an example, mapping information between the hash index and the physical address may be stored in the index table 541. As in the example described above, the index table 541 may store the physical address in a form aligned with the hash index, and the physical address corresponding to the hash index provided from the host may be read from the index table 541.

The deduplication control module 543 may perform various functions for preventing data from being stored redundantly. For example, the deduplication control module 543 may check the hash index provided from the host and information stored in the index table 541 to determine whether data requested to be written corresponds to duplicate data. The deduplication control module 543 may also perform operations for storing reference count information in the index table 541 and managing the reference count information. For example, when there is a hash index that is the same as the hash index from the host, a reference count value corresponding to the same index may be updated, e.g., may be increased.

The data management module 544 may perform various data management operations. For example, the data management module 544 may perform data management operations in a flash memory. According to an embodiment, the data management module 544 may perform data management operations by using information related to data deduplication. For example, the data management operations may be adjusted according to the value of a reference count stored in the index table 541. For example, various management operations, such as data movement, backup, and garbage collection may be performed based on the value of a reference count.
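How a reference count might steer such management operations can be sketched as below. The policy shown is purely an assumption for illustration: it supposes that a reference count of zero marks data that is no longer referenced and may be reclaimed by garbage collection, and that more heavily referenced data is backed up first. The function name `plan_management` and the tuple layout of the table are hypothetical.

```python
def plan_management(index_table):
    """index_table maps a hash index to a (pba, ref_count) tuple.

    Returns (reclaimable, backup_order):
      reclaimable  - physical addresses with a zero reference count
                     (assumed eligible for garbage collection)
      backup_order - physical addresses of referenced data, most
                     referenced first (assumed backup priority)
    """
    entries = list(index_table.values())
    reclaimable = sorted(pba for pba, ref in entries if ref == 0)
    referenced = [entry for entry in entries if entry[1] > 0]
    referenced.sort(key=lambda entry: entry[1], reverse=True)
    backup_order = [pba for pba, _ in referenced]
    return reclaimable, backup_order


table = {"h1": (10, 3), "h2": (11, 0), "h3": (12, 1)}
reclaimable, backup_order = plan_management(table)
```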

In the embodiments of FIGS. 5 and 6, an example in which the deduplication processing according to the embodiments of inventive concepts is performed by software is shown. However, the embodiments of inventive concepts are not limited thereto. As an example, at least some of the functions for deduplication processing may be implemented by hardware or a combination of hardware and software.

FIGS. 7A and 7B are diagrams showing examples of information managed in a host and a storage system, according to embodiments of inventive concepts. FIG. 7A illustrates a general case of determining data duplication on the host side, and FIG. 7B illustrates an example of determining data duplication on the storage system side, according to an embodiment of an inventive concept.

Referring to FIG. 7A, a host may receive logical addresses LBA 1 to LBA 3 and corresponding data from an external system and generate indexes (e.g., hash indexes Hash Index 1 to Hash Index 3) for the data. In addition, mapping information between the logical addresses LBA 1 to LBA 3 and the hash indexes Hash Index 1 to Hash Index 3 may be stored and managed in a memory.

In addition, the host may operate a file system to access a storage system. After performing deduplication, the host may store the hash indexes Hash Index 1 to Hash Index 3 and file information (for example, file identifications File ID 1 to File ID 3) corresponding thereto in the memory. In addition, reference counts Ref CNT 1 to Ref CNT 3 may be further stored corresponding to the hash indexes Hash Index 1 to Hash Index 3 and the file information File ID 1 to File ID 3. Also, the host may provide logical addresses LBA S1 to LBA S3 for the storage system to the storage system via a file system, and the storage system may store mapping information between the logical addresses LBA S1 to LBA S3 and physical addresses PBA 1 to PBA 3. That is, the host may perform data deduplication by performing dual table management, and thus, the amount of memory usage may increase and information retrieval time may increase.

Alternatively, as shown in FIG. 7B, according to the embodiment of inventive concepts, the host may only manage mapping information between the logical addresses LBA 1 to LBA 3 from the external system and the hash indexes Hash Index 1 to Hash Index 3 corresponding thereto. Accordingly, the host does not need to manage a table in duplicate, and the burden of processing information on the host side may be reduced.

The host may provide the hash indexes Hash Index 1 to Hash Index 3 to the storage system, and the storage system may store and manage the mapping information between the hash indexes Hash Index 1 to Hash Index 3 and the physical addresses PBA 1 to PBA 3. The storage system may further store reference counts Ref CNT 1 to Ref CNT 3 corresponding to the hash indexes Hash Index 1 to Hash Index 3 and the physical addresses PBA 1 to PBA 3.
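
As a non-limiting illustration, the division of table management in FIG. 7B may be sketched as follows. The class names IndexEntry and IndexTable, the method names, and the integer values used for the physical addresses are hypothetical and are chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexEntry:
    pba: int          # physical address (PBA) mapped to the hash index
    ref_cnt: int = 1  # reference count (Ref CNT) for the hash index


class IndexTable:
    """Storage-side table: hash index -> (PBA, reference count)."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def lookup(self, hash_index: str) -> Optional[IndexEntry]:
        # Returns None when the hash index has not been stored yet.
        return self._entries.get(hash_index)

    def insert(self, hash_index: str, pba: int) -> IndexEntry:
        entry = IndexEntry(pba=pba)
        self._entries[hash_index] = entry
        return entry


# Host side (FIG. 7B): only the LBA -> hash index mapping is kept.
host_map = {
    "LBA 1": "Hash Index 1",
    "LBA 2": "Hash Index 2",
    "LBA 3": "Hash Index 3",
}

# Storage side (FIG. 7B): hash index -> PBA and reference count.
index_table = IndexTable()
for n, hash_index in enumerate(host_map.values(), start=1):
    index_table.insert(hash_index, pba=n)
```

In this sketch the host holds a single table, while the physical addresses and the reference counts are held only on the storage system side.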

FIG. 8 is a block diagram showing an example of data writing and reading operations of a data processing system 600 according to an embodiment of inventive concepts. In the embodiment shown in FIG. 8, a storage system 620 is an SSD that includes a NAND memory, and a key-value SSD is illustrated as an example.

The data processing system 600 according to the embodiment may include a plurality of storage systems. One or more of the storage systems may be configured as a key-value storage, and the data processing system 600 may be operated in a manner in which actual data is stored in another storage system. For example, the key-value storage may be configured with a low-performance node having low computing power and an SSD, and external clients may access the key-value storage to request storage and reading of data. In this case, a storage system 620, which is one of the storage systems, may receive a key as an index in the above-described embodiment, and may also receive a value as data. Also, similar to the above-described embodiment, the key may be generated through a hash operation on data provided from an external system.

Referring to FIG. 8, the data processing system 600 may include a host 610 and the storage system 620. According to the embodiments described above, the storage system 620 may store and manage a key Key as an index and perform data deduplication by using a stored key.

The host 610 may receive a data write request and/or a data read request. As an example of a write operation, the key Key may be generated through a hash operation on data User Data, and a value Value corresponding to compressed data may be generated through compression processing on the data User Data. In addition, a write request Put (Key, Value) and a read request Get (Key) may be generated through key and value interface command processing, and the generated requests Put (Key, Value) and Get (Key) may be provided to the storage system 620 through an SSD device driver.

The storage system 620 may perform a write/read operation based on deduplication by using the received key Key and the received value Value. For example, the storage system 620 may determine whether data is a duplicate by referring to the received key Key and information stored in an index table. For example, the storage system 620 may determine whether the received key Key is the same as a previously stored key. When there is no same key, the storage system 620 may additionally store mapping information between the received key Key and a physical address PBA corresponding thereto in the index table, and a value Value may be stored in a location indicated by the physical address PBA in a NAND memory. As an example of storage, the received key Key, the value Value, and metadata Meta data corresponding thereto may be stored together in a page of the NAND memory.

According to an example embodiment, a reference count may be further stored in the index table. When there is the same key, a data write operation may be completed by updating a reference count corresponding to the same key without storing the key Key and the value Value in the NAND memory in duplicate. According to an embodiment, a hash operation process may be further performed in the storage system 620 with respect to the key Key and the value Value, and the sizes of the key Key and the value Value, which are stored in the storage system 620, may be further reduced.

As an example of a read operation, the host 610 may generate a key Key corresponding to data User Data through a hash engine and provide the generated key Key to the storage system 620 as information for reading data. As an example, a read request Get (Key) with a key Key may be provided to the storage system 620 via an SSD device driver.

The storage system 620 may receive a read request Get (Key) for the data User Data corresponding to the key Key, determine a physical address PBA mapped to the key Key through the index table, and read data from a location indicated by the physical address PBA. According to an embodiment, a value Value corresponding to data among information stored in the NAND memory may be read and provided to the host 610, and the host 610 may restore data User Data by decompressing the value Value.
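
As a non-limiting illustration, the write and read paths of FIG. 8 may be sketched as follows. The use of SHA-256 as the hash operation and zlib as the compression processing is an assumption made only for this sketch, as the embodiment does not name a particular hash function or compression scheme. The names KeyValueStore, host_write, and host_read are likewise hypothetical, and the NAND memory is modeled as a dictionary.

```python
import hashlib
import zlib


class KeyValueStore:
    """Hypothetical model of the storage system 620 (key-value SSD)."""

    def __init__(self):
        self.index_table = {}  # key -> [PBA, reference count]
        self.nand = {}         # PBA -> (key, value), stored together
        self._next_pba = 0

    def put(self, key, value):
        entry = self.index_table.get(key)
        if entry is not None:
            # Same key already stored: only the reference count changes.
            entry[1] += 1
            return
        pba = self._next_pba
        self._next_pba += 1
        self.index_table[key] = [pba, 1]
        self.nand[pba] = (key, value)

    def get(self, key):
        pba, _ref_cnt = self.index_table[key]
        return self.nand[pba][1]


def host_write(store, user_data):
    key = hashlib.sha256(user_data).hexdigest()  # hash operation (assumed)
    value = zlib.compress(user_data)             # compression (assumed)
    store.put(key, value)                        # Put (Key, Value)
    return key


def host_read(store, key):
    value = store.get(key)                       # Get (Key)
    return zlib.decompress(value)                # restore User Data
```

Writing the same User Data twice through host_write leaves a single entry in the modeled NAND memory and a reference count of 2 for the corresponding key.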

FIG. 9 is a flowchart showing a method of operating a host, according to an embodiment of an inventive concept.

Referring to FIG. 9, the host receives a logical address LBA as an address on an external system side and data Data corresponding to the logical address LBA (Operation S11). The host may generate an index Index through operation processing on the data Data and store the generated index Index in the host as in Operation S12. As an example, the host may generate a hash index by using a hash function, and may store the generated hash index in a memory in alignment with the logical address LBA.

The host may provide the data Data and the index Index corresponding thereto to a storage system (e.g., an SSD) as in Operation S13. The storage system may perform a write operation with data deduplication applied thereto in response to a write request from the host, and when data deduplication is performed according to an embodiment, the host may receive reference count information from the storage system as in Operation S14. The received reference count information may be used for subsequent data management operations.

FIG. 10 is a flowchart showing a method of operating a storage system, according to an embodiment of an inventive concept.

The storage system may receive data Data and an index Index corresponding thereto from a host as in Operation S21, and may search information previously stored in an index table for data deduplication as in Operation S22. According to the search result, information stored in alignment with the received index may be determined, or it may be determined whether there is the same index as the received index, and it may be determined whether the data Data received from the host is duplicate data, based on the determination result.

If it is determined that the data Data received from the host is not duplicate data, the storage system may generate a physical address PBA corresponding to the received index Index and store mapping information between the index Index and the physical address PBA as in Operation S24. Since there is no same index, the data Data provided from the host is first stored in the storage system, so that the data Data may be stored in a location corresponding to the physical address PBA as in Operation S25.

However, if the data Data received from the host is determined to be duplicate data, it may be determined that there is the same data as the data Data provided from the host, and accordingly, the processing for a write request may be completed without storing the data Data in duplicate. As an example, a reference count corresponding to the same index as the index Index provided from the host may be updated as in Operation S26, and information about the updated reference count may be provided to the host as in Operation S27. The update of the reference count may be performed by increasing or decreasing the value of the reference count, and as an example, the reference count may be incremented by 1 each time a write request for the same index is received.
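
As a non-limiting illustration, the flow of FIG. 10 may be sketched as a single function. The function name handle_write, the dictionary-based index table and storage medium, and the allocation of physical addresses by counting stored entries are assumptions made only for this sketch.

```python
from typing import Dict, List, Tuple


def handle_write(index_table: Dict[str, List[int]],
                 media: Dict[int, bytes],
                 index: str,
                 data: bytes) -> Tuple[bool, int]:
    """Return (deduplicated, reference_count) for one write request."""
    # S21: the data and the index arrive together from the host.
    # S22: search the index table for the received index.
    entry = index_table.get(index)
    if entry is None:
        # S24: generate a PBA and store the index -> PBA mapping.
        pba = len(media)  # simplistic allocation, valid without deletes
        index_table[index] = [pba, 1]
        # S25: store the data at the location given by the PBA.
        media[pba] = data
        return False, 1
    # S26: duplicate data, so only the reference count is updated.
    entry[1] += 1
    # S27: the updated reference count is reported to the host.
    return True, entry[1]
```

For example, two successive calls with the same index and data return (False, 1) and then (True, 2), and the data is stored only once.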

Hereinafter, with respect to data deduplication, various operational examples applicable to embodiments of inventive concepts are described. FIGS. 11 to 18 are diagrams showing examples of communication between a host and a storage system in a data processing system according to an embodiment of inventive concepts. In the following embodiments, the storage system is assumed to be a key-value SSD KV SSD, and a value Value may be referred to as data.

In the example of FIG. 11, a data deduplication operation is described without considering data collision. For example, a key Key generated using a hash function or the like has a size smaller than that of the data Value, and thus, a data collision, in which the same key is generated even though the data Value has a different value, may occur.

Referring to FIG. 11, a host generates a key Key from data Value upon receiving a data write request and provides a write request PUT (Key, Value) to a storage system. The storage system may perform a data duplicate check operation using the key Key. When the data Value corresponds to duplicate data, the storage system may update only a reference count Ref CNT without storing the data Value in duplicate. The storage system may provide the host with information Info_DD indicating that deduplication processing has been performed on the write request PUT (Key, Value), and may also provide an updated reference count Ref CNT to the host. The host may determine that data deduplication has been performed, based on the information Info_DD.

FIGS. 12 and 13 illustrate an example of a data deduplication operation considering data collision. FIG. 12 shows an example in which data collision determination is performed on the host side, and FIG. 13 shows an example in which data collision determination is performed on the storage system side.

Referring to FIG. 12, as in the embodiment of FIG. 11, a host generates a key Key from data Value and provides a write request PUT (Key, Value) to a storage system. The storage system performs a data duplicate check operation using the key Key. When the data Value corresponds to duplicate data, the storage system may read data Value corresponding to the received key Key and provide the host with the read data Value in addition to information Info_D indicating that data duplication has occurred.

The host may check if there is a data collision by comparing data Value requested to be written from an external system with data Value provided from the storage system. As an example, the host may determine whether there is the same data, by comparing, in units of bits or bytes, the data Value requested to be written from the external system with the data Value provided from the storage system, and may provide a determination result Res_C to the storage system.

According to a determination result Res_C indicating that the data Value requested to be written is the same as the data Value provided from the storage system, the storage system may update only a reference count Ref CNT without storing the data Value in duplicate and provide the host with information Info_DD indicating that deduplication has been performed. In addition, an updated reference count Ref CNT may be further provided to the host.

When the data Value requested to be written is different from the data Value provided from the storage system, a data collision may be determined to have occurred, and the host may perform a management operation for eliminating the data collision. The data collision elimination may be performed according to various methods. For example, the host may generate a key Key′ having a different hash value for collided data Value. According to an embodiment, the host may provide a new write request PUT (Key′, Value) to the storage system in the event of data collision, and data collision may be prevented as the storage system stores the data Value in response to the write request PUT (Key′, Value).

Referring to FIG. 13, as in the embodiments of FIGS. 11 and 12 described above, a host generates a key Key through a hash function and provides a write request PUT (Key, Value) to a storage system. The storage system may perform a data duplicate check operation using the key Key.

When there is duplicate data according to the data duplicate check operation using the key Key, the storage system may determine whether a data collision occurs. For example, the storage system may read data Value corresponding to the key Key and determine whether the read data Value is the same as data Value provided from the host. If the read data Value is the same as the data Value provided from the host, it may be determined that there is no data collision. Accordingly, the storage system may update only a reference count Ref CNT without storing the data Value in duplicate. In addition, the storage system may provide the host with information Info_DD indicating that deduplication has been performed and an updated reference count Ref CNT.

On the other hand, when it is determined that a data collision has occurred, the storage system may provide the host with information Info_C indicating that a data collision has occurred. The host may perform a management operation for eliminating data collision, in substantially the same manner as in the above-described embodiments. For example, the host may generate a key Key′ having a different hash value for collided data Value and provide a new write request PUT (Key′, Value) to the storage system.

In the above-described embodiments, an example in which a new hash operation is performed to eliminate data collision has been described. However, embodiments of inventive concepts are not limited thereto. For example, the data collision may be eliminated by writing collided data in the storage system and managing table information as a linked list. Also, the number of collisions may be counted and data collision management may be performed based on the number of collisions, and various management methods may be used to eliminate the data collision.
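
As a non-limiting illustration, the storage-side collision determination of FIG. 13 combined with generation of a key Key′ by the host may be sketched as follows. The function weak_hash is a deliberately weak, hypothetical hash used only so that a collision can be demonstrated, and deriving Key′ by mixing a salt into the hash is an assumption, as the embodiments do not specify how the different hash value is obtained. The status string STORED and the names KVStore and host_put are also hypothetical.

```python
def weak_hash(data, salt=0):
    # Byte sum modulo 256: b"ab" and b"ba" collide on purpose.
    return (sum(data) + 131 * salt) % 256


class KVStore:
    def __init__(self):
        self.table = {}  # key -> [value, reference count]

    def put(self, key, value):
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = [value, 1]
            return "STORED"
        if entry[0] == value:
            entry[1] += 1      # same data: update only the count
            return "Info_DD"
        return "Info_C"        # same key, different data: collision


def host_put(store, value, max_attempts=8):
    # Retry with a different hash value (Key') while Info_C is returned.
    for salt in range(max_attempts):
        key = weak_hash(value, salt)
        status = store.put(key, value)
        if status != "Info_C":
            return key, status
    raise RuntimeError("collision could not be eliminated")
```

In this sketch the host would need to record the key finally used for each logical address so that the data can be read back later.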

In drawings illustrating the following embodiments, data Value is assumed to correspond to “ABCD” and a key corresponding to “123” is generated from the data Value. FIG. 14 shows an example of the operation of a data processing system related to the management of reference counts.

Referring to FIG. 14, a host generates a key corresponding to “123” from the data Value and provides a write request PUT (123, ABCD) for the data Value to a storage system. The storage system may determine that received data Value does not correspond to duplicate data, and normally write the data Value in accordance with the write request PUT (123, ABCD) and also provide the host with a signal indicating completion of writing.

Thereafter, the host may provide a write request for duplicate data “ABCD” to the storage system. The storage system may determine that the received write request corresponds to a write request for duplicate data through a check operation using a key, and may provide the host with information Info_D indicating that data duplication has occurred. The host may provide the storage system with a read request GET 123 including the key corresponding to “123”, in response to the information Info_D, and the storage system may read the data Value in response to the read request GET 123 and provide the read data Value to the host.

The host may determine whether a data collision has occurred, by using the data Value read from the storage system in the same manner as in the above-described embodiment. As a result of the determination, when the data Value received from the storage system is the same as data requested to be written, the host may request the storage system to increase a reference count for the key corresponding to “123” as a data collision has not occurred. The storage system may complete a write operation by increasing (or updating) the reference count for the key corresponding to “123” in response to a request from the host.

According to the above-described embodiment, a reference count for keys may be managed in the storage system, whereas an operation of adjusting the reference count for the keys may be performed by the host. For example, upon a write request for duplicate data, whether a data collision occurs may be determined by the host, and a reference count in the storage system may be managed by the host based on the determination result.

FIG. 15 shows an example of an operation using reference count information in a storage system.

Referring to FIG. 15, as in the above-described embodiment, a host generates a key corresponding to “123” from data Value “ABCD” and provides a write request PUT (123, ABCD) for the data Value to the storage system. The storage system may determine whether the data Value is a duplicate, and may write the data Value or update only a reference count without writing the data Value, according to a result of the determination. In addition, information Info_DD indicating that deduplication has been performed may be provided to the host.

The host may perform an overwrite or partial update operation on existing data Value according to a request from an external system. As an example, the host may provide a request PUT_OVERWRITE to overwrite data Value, which corresponds to a key corresponding to “123”, with another value, or provide a request PUT_PARTIAL_UPDATE to update some of the data Value corresponding to the key.

The storage system may determine whether to perform the overwrite request or partial update request from the host. For example, the storage system may determine whether to perform the overwrite request or partial update request by checking a reference count corresponding to the key corresponding to “123”. As a result of checking the reference count, if data corresponding to the key corresponding to “123” is data referred to by a plurality of logical addresses on the external system side, the storage system may provide the host with information Failed indicating that the data cannot be changed. The host may again determine whether the data has been changed, based on the information Failed from the storage system.
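
As a non-limiting illustration, the reference count check of FIG. 15 may be sketched as follows. The interpretation that a reference count greater than 1 indicates data referred to by a plurality of logical addresses, the status string OK, and the offset-based form of the partial update are assumptions made only for this sketch.

```python
class KVStore:
    def __init__(self):
        self.table = {}  # key -> {"value": bytes, "ref_cnt": int}

    def put(self, key, value):
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = {"value": value, "ref_cnt": 1}
            return "OK"
        entry["ref_cnt"] += 1
        return "Info_DD"

    def put_overwrite(self, key, new_value):
        entry = self.table[key]
        if entry["ref_cnt"] > 1:
            return "Failed"  # shared data cannot be changed
        entry["value"] = new_value
        return "OK"

    def put_partial_update(self, key, offset, patch):
        entry = self.table[key]
        if entry["ref_cnt"] > 1:
            return "Failed"  # shared data cannot be changed
        old = entry["value"]
        entry["value"] = old[:offset] + patch + old[offset + len(patch):]
        return "OK"
```

The sketch does not address how a key derived from the original data would be handled after the data is changed, as this is not described for the embodiment.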

FIG. 16 shows an example in which data deduplication is determined by a host.

Referring to FIG. 16, as in the above-described embodiment, the host generates a key corresponding to “123” from data Value “ABCD” and provides a write request PUT (123, ABCD) for the data Value to a storage system. The storage system may perform a duplicate check operation using the key and provide a determination result to the host. When the data Value is not duplicate data, data requested to be written is normally written. When the data is duplicate data, information Info_D indicating that data duplication has occurred may be provided to the host.

The host may selectively perform data deduplication. For example, the host may operate according to a data deduplication mode, and when the host is in the data deduplication mode, the host may provide a data deduplication request Req_Dedup to the storage system. The storage system may update a reference count corresponding to the key in response to the data deduplication request Req_Dedup and provide the host with information Info_DD indicating that data deduplication has been performed. On the other hand, when the host is not in the deduplication mode, the host may provide a data duplication request Req_Dup to the storage system, and the storage system may store data Value in duplicate and provide the host with information indicating that the data Value has been normally stored.
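
As a non-limiting illustration, the exchange of FIG. 16 may be sketched as follows. The method names req_dedup and req_dup mirror the requests Req_Dedup and Req_Dup, whereas keeping redundant copies in a per-key list and the status string OK are assumptions, since the embodiment does not describe how a duplicate copy is indexed.

```python
class KVStore:
    def __init__(self):
        self.ref_cnt = {}  # key -> reference count
        self.copies = {}   # key -> list of stored copies of the value

    def put(self, key, value):
        if key in self.copies:
            return "Info_D"             # duplicate found; host decides
        self.copies[key] = [value]
        self.ref_cnt[key] = 1
        return "OK"

    def req_dedup(self, key):
        self.ref_cnt[key] += 1          # no new copy is written
        return "Info_DD"

    def req_dup(self, key, value):
        self.copies[key].append(value)  # store the data in duplicate
        return "OK"


def host_write(store, key, value, dedup_mode):
    status = store.put(key, value)
    if status != "Info_D":
        return status
    if dedup_mode:
        return store.req_dedup(key)
    return store.req_dup(key, value)
```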

The application of the deduplication as described above may be performed based on various criteria. For example, the host may determine the importance of the data Value according to the data Value, and may request that relatively significant data (or data requiring storage stability) be stored redundantly. Accordingly, data deduplication may be applied to relatively insignificant data.

According to an embodiment, the storage system may provide a reference count stored therein to the host, and the host may determine the value of the reference count in applying data deduplication. As an example, when the value of a reference count corresponding to a specific key is relatively large, it may indicate that data corresponding to the specific key is frequently referred to. In this case, the host may determine whether the value of the reference count corresponding to the specific key exceeds a threshold. In an operation of writing the data corresponding to the specific key, data deduplication may or may not be applied according to a result of comparison between the value of the reference count and the threshold.

FIG. 17 shows an example of an operation using a reference count in a data processing system. A detailed description of the same features as those described in the preceding embodiments among the features shown in FIG. 17 is omitted.

Referring to FIG. 17, a data processing system according to an embodiment of an inventive concept may include a host and a storage system, and a reference count corresponding to a key may be stored in the storage system. The storage system may also provide the reference count to the host. The reference count may be provided from the storage system to the host in various manners. As an example, whenever data deduplication is applied, the reference count corresponding to the key may be provided to the host. Alternatively, regardless of data deduplication, a reference count stored in a memory in the storage system may be provided to the host periodically or aperiodically.

The host may store the reference count provided from the storage system and may determine the reference count to perform a data management operation. For example, for a key having a relatively large reference count (or exceeding a threshold value), the host may determine that data corresponding to the key is of high significance, and perform a management operation according to the data. In an embodiment, the host may provide a backup request for the data corresponding to the key to the storage system. Alternatively, the host may request the storage system to move the data corresponding to the key to a reliable storage area (e.g., a single level cell (SLC) area in a NAND memory). The storage system may perform operations according to a management request from the host.
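
As a non-limiting illustration, the host-side selection of keys for a management operation in FIG. 17 may be sketched as follows. The function name plan_management and the request labels BACKUP and MOVE_TO_SLC are hypothetical, and the threshold value is chosen arbitrarily for this sketch.

```python
def plan_management(ref_counts, threshold, action="BACKUP"):
    """Return (action, key) requests for keys whose count exceeds threshold."""
    if action not in ("BACKUP", "MOVE_TO_SLC"):
        raise ValueError("unknown management action")
    requests = []
    for key, count in sorted(ref_counts.items()):
        if count > threshold:
            # Data with a large reference count is treated as significant.
            requests.append((action, key))
    return requests


# Reference counts as received from the storage system (example values).
received = {"123": 5, "456": 1, "789": 4}
requests = plan_management(received, threshold=3, action="MOVE_TO_SLC")
```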

FIG. 18 shows an example of an operation using a reference count in a data processing system. A detailed description of the same features as those described in the preceding embodiments among the features shown in FIG. 18 is omitted.

An example, in which a reference count is stored in a storage system, whereas the value of the reference count is provided from a host to the storage system, is shown in FIG. 18. In addition, although an example in which an operation for determining whether data is a duplicate or determining whether there is a data collision is performed in the host is shown in FIG. 18, the embodiment of inventive concepts is not limited thereto. For example, the current embodiment may be applied even when data collision management is not performed, or data collision management is performed on the storage system side.

Referring to FIG. 18, the host may generate a key corresponding to “123” from data Value “ABCD”, and the storage system may store the data Value as a write request PUT (123, ABCD) for the data Value is provided to the storage system. Thereafter, when the same key (for example, “123”) is generated, the host may determine that data duplication has occurred, and may provide the storage system with a request for reading data Value corresponding to the key corresponding to “123”. The storage system may provide the host with a reference count (e.g., 1) corresponding to the key, along with the data Value corresponding to the key.

The host uses the data Value received from the storage system to determine whether a data collision occurs. If it is determined that a data collision has occurred, the host may perform a data collision management operation according to the above-described embodiments. On the other hand, if it is determined that a data collision has not occurred, the host may again provide a data write request to the storage system and may change the reference count corresponding to the key to a value of 2 and provide the value of 2 to the storage system. The storage system may perform a write operation based on data deduplication. For example, the storage system may update the reference count corresponding to the key from 1 to 2 according to information provided from the host without writing duplicate data.
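
As a non-limiting illustration, the exchange of FIG. 18, in which the host supplies the new value of the reference count, may be sketched as follows. The class names KVStore and Host, the ref_cnt parameter of put, and the returned status strings are hypothetical and are chosen only for this sketch.

```python
class KVStore:
    def __init__(self):
        self.table = {}  # key -> [value, reference count]

    def put(self, key, value, ref_cnt=1):
        entry = self.table.get(key)
        if entry is None:
            self.table[key] = [value, ref_cnt]
        else:
            # Deduplicated write: only the host-supplied count is stored.
            entry[1] = ref_cnt

    def get(self, key):
        value, ref_cnt = self.table[key]
        return value, ref_cnt  # the count is returned along with the data


class Host:
    def __init__(self, store):
        self.store = store
        self.known_keys = set()

    def write(self, key, value):
        if key not in self.known_keys:
            self.known_keys.add(key)
            self.store.put(key, value)
            return "STORED"
        # Same key generated again: read back the data and the count.
        stored_value, ref_cnt = self.store.get(key)
        if stored_value != value:
            return "COLLISION"  # handled by collision management
        self.store.put(key, value, ref_cnt=ref_cnt + 1)
        return "DEDUPLICATED"
```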

FIG. 19 is a block diagram of a network system 700 including a server system 710 according to an embodiment of an inventive concept. A plurality of terminals 731_1 to 731_n (e.g., computing nodes) in addition to the server system 710 are shown in FIG. 19, and the server system 710 may be implemented using the data processing system according to the above-described embodiments.

Referring to FIG. 19, the network system 700 may include the plurality of terminals 731_1 to 731_n which communicate with the server system 710 via a network 720. The server system 710 may include a server 711 and an SSD 712 as a storage system. The server 711 may perform functions of the host in the above-described embodiments.

The server 711 may process requests transmitted from the plurality of terminals 731_1 to 731_n connected to the network 720. As an example, the server 711 may store data provided from the plurality of terminals 731_1 to 731_n in the SSD 712. In addition, according to the above-described embodiments, the server 711 may reduce or prevent the duplicate storage of the same data by using a data deduplication function, and thus, a storage space of the SSD 712 may be efficiently used.

The SSD 712 may manage a hash index and a reference count according to the embodiments described above. According to an embodiment, the SSD 712 may receive data and a hash index corresponding thereto from the host, determine whether the data is a duplicate, by using the hash index, and perform an operation for storing the data and updating the reference count, according to a determination result. In addition, when the value of a reference count corresponding to a specific key is large, data corresponding thereto may correspond to data used in one or more of the plurality of terminals 731_1 to 731_n, and a data management operation using a reference count may be performed according to above-described embodiments.

While inventive concepts have been particularly shown and described with reference to embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.

What is claimed is:
 1. A storage system comprising: a storage device configured to store data received from a host; and a controller configured to concurrently receive, from the host, both the data and an index, the index associated with all of the data received from the host, the index determined by a hash-engine within the host, the controller including a memory configured to store mapping information and a reference count, the mapping information associating the index received from the host with a physical address of the storage system, the reference count associated with the index received from the host, the controller being configured to determine whether the data received from the host corresponds to a duplicate of data previously stored in the storage device by reading, from the memory, at least one of the mapping information and the reference count, the reading based on the index received from the host, and the controller being configured to perform a deduplication process by updating the reference count in response to the data received from the host corresponding to the duplicate of data previously stored in the storage device, wherein the storage device is a key-value storage device configured to store the data received from the host as a value and is configured to store the index received from the host as a key associated with the value, and the index received from the host is a hash generated through a hash function of the data.
 2. The storage system of claim 1, wherein in response to an index associated with the data previously stored in the storage device being the same as the index received from the host, the controller is configured to determine that the data received from the host corresponds to the duplicate of data previously stored in the storage device.
 3. The storage system of claim 2, wherein the memory is configured to store first to Nth reference counts corresponding to first to Nth indexes (where N is an integer equal to or greater than 2), and the controller is configured to perform the deduplication process by increasing the first reference count corresponding to the first index in response to the index received from the host being the same as the first index.
 4. The storage system of claim 1, wherein the controller is configured to provide the updated reference count to the host.
 5. The storage system of claim 1, wherein the controller is configured to provide the host with first information indicating that the data received from the host is the duplicate of data previously stored in the storage device, and is configured to receive a reference count update request from the host and perform an update on the reference count in response to the reference count update request.
 6. A method of operating a storage system, the method comprising: concurrently receiving, from a host, first data and a first index, the first index being associated with all of the first data, the first index determined by a hash engine within the host; determining whether the first index is the same as an index corresponding to data previously stored in the storage system; performing a data deduplication by updating a reference count without writing the first data in response to determining that the first index is the same as the index corresponding to data previously stored in the storage system, the reference count being previously stored in the storage system; and providing the updated reference count to the host, wherein the storage system includes a key-value storage device configured to store the data received from the host as a value and is configured to store the index received from the host as a key associated with the value.
 7. The method of claim 6, wherein the storage system is configured to store second to Nth indexes and second to Nth reference counts corresponding thereto (where N is an integer equal to or greater than 3), and the method further comprises: increasing the second reference count corresponding to the second index in response to the first index being a same as the second index.
 8. The method of claim 7, wherein the providing the updated reference count comprises providing the host with the second reference count.
 9. The method of claim 6, further comprising: newly mapping the first index to a first physical address in response to determining that the first index is not the same index as an index corresponding to data previously stored in the storage system; writing the first data in a location indicated by the first physical address in response to determining that the first index is not the same index as an index corresponding to data previously stored in the storage system; and storing the first physical address and the first reference count corresponding to the first index in response to determining that the first index is not the same index as an index corresponding to data previously stored in the storage system.
 10. The method of claim 6, further comprising: providing the host with information indicating that the first data is duplicate data in response to determining that the first index is the same index as an index corresponding to data previously stored in the storage system; and receiving at least one of a deduplication request or a duplicate storage request from the host in response to determining that the first index is the same index as an index corresponding to data previously stored in the storage system, wherein the data deduplication is performed in response to the deduplication request from the host, and the first data is stored in duplicate in response to the duplicate storage request from the host.
 11. The method of claim 6, further comprising: reading data corresponding to the same index in response to determining that the first index is the same index as an index corresponding to data previously stored in the storage system; and deciding whether a data collision occurs by comparing the read data with the first data in response to determining that the first index is the same index as an index corresponding to data previously stored in the storage system, wherein the data deduplication is performed in response to the deciding whether a data collision occurs decides that the data collision has not occurred.
 12. The method of claim 11, further comprising, providing information indicating that the data collision has occurred to the host in response to the data collision having occurred; receiving the first index which has been changed corresponding to the first data; and writing the first data in a location indicated by a first physical address newly mapped to the first index which has been changed.
 13. The method of claim 6, further comprising: providing the host with first information indicating that the first data is duplicate data in response to determining that there is the same index as the first index; and receiving, from the host, second information indicating a changed reference count in response to determining that there is the same index as the first index, wherein the updating the reference count comprises updating the reference count to correspond to the second information provided from the host.
 14. A method of operating a data processing system including a storage system, the method comprising: storing mapping information in the storage system, the mapping information including a mapping between an index generated using data from an external system and a physical address indicating a storage location of the data; receiving in the storage system a write request including additional data and an index corresponding to all of the additional data, the index determined by the external system; determining, in the storage system, whether the additional data corresponds to a duplicate of data already stored in the storage system; performing a deduplication process by updating a reference count stored in the storage system in response to the additional data corresponding to the duplicate of data already stored in the storage system; and providing the updated reference count from the storage system to a host, wherein the storage system includes a key-value storage device configured to store the data received from the host as a value and is configured to store the index received from the host as a key associated with the value.
 15. The method of claim 14, wherein the data processing system further includes a host, and the method further comprises: generating, in the host, the index through a hash function of the additional data; and providing, to the storage system, the additional data and the index corresponding to the additional data.
 16. The method of claim 15, further comprising: in the host, compressing, the additional data received from the external system.
 17. The method of claim 15, further comprising: providing first information from the storage system to the host, the first information indicating that the write request is a write request for the duplicate of data previously stored in the storage system; and providing second information, the second information requesting the deduplication process, the second information provided from the host to the storage system, wherein the storage system is configured to perform the deduplication process in response to the second information.
 18. The method of claim 15, further comprising: reading data, corresponding to the index, from the storage system; providing the read data to the host, in response to determining that the write request is a write request for duplicate data; and determining, in the host, whether a data collision occurs by comparing the data with the read data.
 19. The method of claim 15, further comprising: in the host, checking a reference count corresponding to a plurality of indexes provided from the storage system; and providing the storage system with a backup request for data corresponding to a reference count exceeding a threshold, based on a result of the checking.
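
By way of a non-limiting illustration, the write flow recited in the method claims above (host-generated index, duplicate check, reference count update without rewriting the data, new mapping for an unseen index, collision check, and a host-side threshold check for backup) may be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the names DedupStore, Entry, host_index, and indexes_needing_backup are hypothetical, SHA-256 stands in for the unspecified hash function, physical addresses are modeled as sequential integers, and an initial reference count of 1 is assumed.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    physical_address: int  # assumed flat integer address space
    reference_count: int


class DedupStore:
    """Toy key-value store: key = index from the host, value = data."""

    def __init__(self):
        self.mapping = {}      # index -> Entry (mapping information and reference count)
        self.media = {}        # physical address -> stored data
        self.next_address = 0  # next free physical address (assumption)

    def write(self, index, data):
        """Handle one write request; return (status, reference_count)."""
        entry = self.mapping.get(index)
        if entry is None:
            # Unseen index: newly map it, write the data, store the count.
            address = self.next_address
            self.next_address += 1
            self.media[address] = data
            self.mapping[index] = Entry(address, 1)  # initial count of 1 is assumed
            return "written", 1
        if self.media[entry.physical_address] != data:
            # Same index but different data: report a collision to the host.
            return "collision", entry.reference_count
        # Duplicate: update the reference count without writing the data.
        entry.reference_count += 1
        return "deduplicated", entry.reference_count


def host_index(data: bytes) -> str:
    """Host-side index generation; SHA-256 is an assumed hash function."""
    return hashlib.sha256(data).hexdigest()


def indexes_needing_backup(store, threshold):
    """Host-side check: indexes whose reference count exceeds the threshold."""
    return [i for i, e in store.mapping.items() if e.reference_count > threshold]


if __name__ == "__main__":
    store = DedupStore()
    payload = b"example block"
    print(store.write(host_index(payload), payload))  # ('written', 1)
    print(store.write(host_index(payload), payload))  # ('deduplicated', 2)
    print(len(store.media))                           # 1
```

In this sketch the second write of the same payload leaves a single stored copy and returns the updated reference count to the caller, which corresponds to providing the updated reference count to the host.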